Efficient Algorithms to Execute Complex Similarity Queries in RDBMS
Abstract
Search operations over large sets of complex objects usually rely on similarity-based criteria, because such objects lack other general properties that could be used to compare them, such as a total order relationship or even an equality relationship between pairs of objects, both of which are commonly used with numeric or short-text data. Similarity is therefore the core criterion for comparing complex objects. There are two basic operators for similarity queries: the Range Query and the k-Nearest Neighbors Query. Much research has been devoted to developing effective algorithms that implement them as standalone operations. However, algorithms that support these operators as parts of more complex expressions involving their composition have not yet been developed. This paper presents two new algorithms specifically designed to answer conjunctive and disjunctive operations involving the basic similarity criteria; they also support the manipulation of tie lists when the k-Nearest Neighbors Query is involved. The proposed algorithms were compared with combinations of the basic algorithms, using both a sequential scan and the Slim-tree metric access method, by measuring the number of disk accesses, the number of distance calculations, and wall-clock time. The experimental results show that the new algorithms outperform the composition of the two basic operators on complex similarity queries in all measured aspects, being up to 40 times faster than the composition of the basic algorithms. This is an essential step toward enabling the practical use of similarity operators in Relational Database Management Systems.
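To make the terminology in the abstract concrete, the following Python sketch shows the two basic operators over a sequential scan and the baseline composition that the abstract uses as its point of comparison: run both operators separately, then intersect or union the results. It is not the paper's new algorithms, which the abstract does not describe. The function names, the use of Euclidean distance on tuples, the option of separate query centers, and the reading of a tie list as "every object at the same distance as the k-th neighbor" are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def range_query(data, center, radius, distance=dist):
    """Sequential scan: every object within `radius` of `center`."""
    return [obj for obj in data if distance(obj, center) <= radius]


def knn_query(data, center, k, distance=dist):
    """Sequential scan: the k nearest objects, plus any objects tied
    with the k-th one (an assumed interpretation of a tie list)."""
    if k <= 0:
        return []
    ranked = sorted(data, key=lambda obj: distance(obj, center))
    if len(ranked) <= k:
        return ranked
    kth_distance = distance(ranked[k - 1], center)
    return [obj for obj in ranked if distance(obj, center) <= kth_distance]


def conjunctive_baseline(data, knn_center, k, range_center, radius):
    """kNN AND Range, answered by running both operators and intersecting."""
    in_range = set(range_query(data, range_center, radius))
    return [obj for obj in knn_query(data, knn_center, k) if obj in in_range]


def disjunctive_baseline(data, knn_center, k, range_center, radius):
    """kNN OR Range, answered by running both operators and taking the union."""
    nearest = knn_query(data, knn_center, k)
    seen = set(nearest)
    extra = [obj for obj in range_query(data, range_center, radius)
             if obj not in seen]
    return nearest + extra


points = [(0, 0), (1, 0), (0, 1), (2, 0), (3, 3)]
origin = (0, 0)

# (1, 0) and (0, 1) are equidistant from the origin, so k=2 yields three objects.
print(knn_query(points, origin, 2))
print(conjunctive_baseline(points, origin, 2, origin, 0.5))
print(disjunctive_baseline(points, origin, 1, origin, 2))
```

Each baseline scans the data twice and computes every distance twice, which is the kind of redundant work that an algorithm evaluating the combined predicate in a single pass could avoid.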
Similar Resources
Identifying Algebraic Properties to Support Optimization of Unary Similarity Queries
Conventional operators for data retrieval are either based on exact matching or on total order relationship among elements. Neither of them is appropriate to manage complex data, such as multimedia data, time series and genetic sequences. In fact, the most meaningful way to compare complex data is by similarity. However, the Relational Algebra, employed in the Relational Database Management Sys...
Graph data management for molecular and cell biology
As high-throughput biology begins to generate large volumes of systems biology data, the need grows for robust, efficient database systems to support investigations of metabolic and signaling pathways, chemical reaction networks, gene regulatory networks, and protein interaction networks. Network data is frequently represented as graphs, and researchers need to navigate, query and manipulate th...
Approximation-Based Similarity Search for 3-D Surface Segments
The issue of finding similar 3-D surface segments arises in many recent applications of spatial database systems, such as molecular biology, medical imaging, CAD, and geographic information systems. Surface segments being similar in shape to a given query segment are to be retrieved from the database. The two main questions are how to define shape similarity and how to efficiently execute similari...
Efficient Processing of RDF Queries with Nested Optional Graph Patterns in an RDBMS
Relational technology has shown to be very useful for scalable Semantic Web data management. Numerous researchers have proposed to use RDBMSs to store and query voluminous RDF data using SQL and RDF query languages. In this article, we study how RDF queries with the so-called well-designed graph patterns and nested optional patterns can be efficiently evaluated in an RDBMS. We propose to extend ...
RankSQL: Supporting Ranking Queries in Relational Database Management Systems
Ranking queries (or top-k queries) are dominant in many emerging applications, e.g., similarity queries in multimedia databases, searching Web databases, middleware, and data mining. The increasing importance of top-k queries warrants an efficient support of ranking in the relational database management system (RDBMS) and has recently gained the attention of the research community. Top-k querie...
Journal:
- J. Braz. Comp. Soc.
Volume 9, Issue
Pages -
Publication date: 2004